Overview

Dataset statistics

Number of variables: 10
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 60.1 KiB
Average record size in memory: 80.2 B

Variable types

Numeric: 6
Categorical: 4

Warnings

relative compactness is highly correlated with surface area and 4 other fields (High correlation)
surface area is highly correlated with relative compactness and 4 other fields (High correlation)
roof area is highly correlated with relative compactness and 4 other fields (High correlation)
overall height is highly correlated with relative compactness and 4 other fields (High correlation)
heating load is highly correlated with relative compactness and 4 other fields (High correlation)
cooling load is highly correlated with relative compactness and 4 other fields (High correlation)
relative compactness is highly correlated with surface area and 2 other fields (High correlation)
surface area is highly correlated with relative compactness and 2 other fields (High correlation)
roof area is highly correlated with relative compactness and 4 other fields (High correlation)
overall height is highly correlated with relative compactness and 4 other fields (High correlation)
heating load is highly correlated with roof area and 2 other fields (High correlation)
cooling load is highly correlated with roof area and 2 other fields (High correlation)
glazing area distribution is highly correlated with glazing area (High correlation)
overall height is highly correlated with surface area and 5 other fields (High correlation)
surface area is highly correlated with overall height and 5 other fields (High correlation)
roof area is highly correlated with overall height and 5 other fields (High correlation)
heating load is highly correlated with overall height and 6 other fields (High correlation)
wall area is highly correlated with overall height and 5 other fields (High correlation)
glazing area is highly correlated with glazing area distribution and 2 other fields (High correlation)
cooling load is highly correlated with overall height and 6 other fields (High correlation)
relative compactness is highly correlated with overall height and 5 other fields (High correlation)
roof area is highly correlated with overall height (High correlation)
overall height is highly correlated with roof area (High correlation)
overall height is uniformly distributed (Uniform)
orientation is uniformly distributed (Uniform)
glazing area distribution has 48 (6.2%) zeros (Zeros)

Reproduction

Analysis started: 2021-09-07 07:21:09.548927
Analysis finished: 2021-09-07 07:21:26.089497
Duration: 16.54 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
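
The reported duration can be cross-checked from the two timestamps. A minimal sketch using only the Python standard library, with both values copied from the Reproduction block:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2021-09-07 07:21:09.548927")
finished = datetime.fromisoformat("2021-09-07 07:21:26.089497")

# Elapsed wall-clock time in seconds; the report rounds this to two decimals.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```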

Variables

relative compactness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7641666681
Minimum: 0.6200000048
Maximum: 0.9800000191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.6200000048
5th percentile: 0.6200000048
Q1: 0.6825000048
Median: 0.75
Q3: 0.8299999982
95th percentile: 0.9800000191
Maximum: 0.9800000191
Range: 0.3600000143
Interquartile range (IQR): 0.1474999934

Descriptive statistics

Standard deviation: 0.1057774774
Coefficient of variation (CV): 0.1384219985
Kurtosis: -0.7065673466
Mean: 0.7641666681
Median Absolute Deviation (MAD): 0.07999998331
Skewness: 0.495512586
Sum: 586.8800011
Variance: 0.01118887472
Monotonicity: Not monotonic
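
Several of these statistics are simple functions of the others. The sketch below recomputes the range, IQR, coefficient of variation and variance from the values reported for relative compactness. It only checks the arithmetic and assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared); it is not taken from the profiling library's code.

```python
# Values copied from the "relative compactness" statistics in this report.
mean = 0.7641666681
std = 0.1057774774
q1, q3 = 0.6825000048, 0.8299999982
minimum, maximum = 0.6200000048, 0.9800000191

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(f"range={value_range:.10f} iqr={iqr:.10f} cv={cv:.10f} variance={variance:.11f}")
```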
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
0.7900000215      64     8.3%
0.6200000048      64     8.3%
0.6600000262      64     8.3%
0.7400000095      64     8.3%
0.8199999928      64     8.3%
0.8600000143      64     8.3%
0.8999999762      64     8.3%
0.6899999976      64     8.3%
0.9800000191      64     8.3%
0.6399999857      64     8.3%
Other values (2)  128    16.7%

Minimum values

Value         Count  Frequency (%)
0.6200000048  64     8.3%
0.6399999857  64     8.3%
0.6600000262  64     8.3%
0.6899999976  64     8.3%
0.7099999785  64     8.3%
0.7400000095  64     8.3%
0.7599999905  64     8.3%
0.7900000215  64     8.3%
0.8199999928  64     8.3%
0.8600000143  64     8.3%

Maximum values

Value         Count  Frequency (%)
0.9800000191  64     8.3%
0.8999999762  64     8.3%
0.8600000143  64     8.3%
0.8199999928  64     8.3%
0.7900000215  64     8.3%
0.7599999905  64     8.3%
0.7400000095  64     8.3%
0.7099999785  64     8.3%
0.6899999976  64     8.3%
0.6600000262  64     8.3%

surface area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 671.7083333
Minimum: 514.5
Maximum: 808.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 514.5
5th percentile: 514.5
Q1: 606.375
Median: 673.75
Q3: 741.125
95th percentile: 808.5
Maximum: 808.5
Range: 294
Interquartile range (IQR): 134.75

Descriptive statistics

Standard deviation: 88.08611606
Coefficient of variation (CV): 0.1311374471
Kurtosis: -1.059454167
Mean: 671.7083333
Median Absolute Deviation (MAD): 73.5
Skewness: -0.1251308847
Sum: 515872
Variance: 7759.163842
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
563.5             64     8.3%
735               64     8.3%
686               64     8.3%
637               64     8.3%
808.5             64     8.3%
514.5             64     8.3%
759.5             64     8.3%
710.5             64     8.3%
661.5             64     8.3%
612.5             64     8.3%
Other values (2)  128    16.7%

Minimum values

Value  Count  Frequency (%)
514.5  64     8.3%
563.5  64     8.3%
588    64     8.3%
612.5  64     8.3%
637    64     8.3%
661.5  64     8.3%
686    64     8.3%
710.5  64     8.3%
735    64     8.3%
759.5  64     8.3%

Maximum values

Value  Count  Frequency (%)
808.5  64     8.3%
784    64     8.3%
759.5  64     8.3%
735    64     8.3%
710.5  64     8.3%
686    64     8.3%
661.5  64     8.3%
637    64     8.3%
612.5  64     8.3%
588    64     8.3%

wall area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 318.5
Minimum: 245
Maximum: 416.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 245
5th percentile: 245
Q1: 294
Median: 318.5
Q3: 343
95th percentile: 416.5
Maximum: 416.5
Range: 171.5
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 43.62648144
Coefficient of variation (CV): 0.136974824
Kurtosis: 0.11659327
Mean: 318.5
Median Absolute Deviation (MAD): 24.5
Skewness: 0.5334174897
Sum: 244608
Variance: 1903.269883
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Value  Count  Frequency (%)
318.5  192    25.0%
294    192    25.0%
343    128    16.7%
367.5  64     8.3%
245    64     8.3%
269.5  64     8.3%
416.5  64     8.3%

Minimum values

Value  Count  Frequency (%)
245    64     8.3%
269.5  64     8.3%
294    192    25.0%
318.5  192    25.0%
343    128    16.7%
367.5  64     8.3%
416.5  64     8.3%

Maximum values

Value  Count  Frequency (%)
416.5  64     8.3%
367.5  64     8.3%
343    128    16.7%
318.5  192    25.0%
294    192    25.0%
269.5  64     8.3%
245    64     8.3%

roof area
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value   Count
220.5   384
147.0   192
122.5   128
110.25  64

Length

Max length: 6
Median length: 5
Mean length: 5.083333333
Min length: 5

Characters and Unicode

Total characters: 3904
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
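
Because roof area was profiled as a categorical (text) variable, these character statistics describe the string form of each value. The sketch below recomputes them from the category labels and counts shown in this section, using the standard-library `unicodedata` module. It illustrates how such totals can arise and is not the profiling library's own implementation.

```python
import unicodedata
from collections import Counter

# Category labels and counts copied from the "roof area" section.
value_counts = {"220.5": 384, "147.0": 192, "122.5": 128, "110.25": 64}

char_counts = Counter()
category_counts = Counter()
for label, count in value_counts.items():
    for ch in label:
        char_counts[ch] += count
        # "Nd" is the Unicode general category Decimal Number,
        # "Po" is Other Punctuation.
        category_counts[unicodedata.category(ch)] += count

total_characters = sum(char_counts.values())
print(total_characters, len(char_counts), dict(category_counts))
```

The totals agree with the figures reported here: 3904 characters, 7 distinct characters and 2 Unicode categories.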

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 110.25
2nd row: 110.25
3rd row: 110.25
4th row: 110.25
5th row: 122.5

Common Values

Value   Count  Frequency (%)
220.5   384    50.0%
147.0   192    25.0%
122.5   128    16.7%
110.25  64     8.3%

Length

Histogram of lengths of the category

Pie chart

Value   Count  Frequency (%)
220.5   384    50.0%
147.0   192    25.0%
122.5   128    16.7%
110.25  64     8.3%

Most occurring characters

Value  Count  Frequency (%)
2      1088   27.9%
.      768    19.7%
0      640    16.4%
5      576    14.8%
1      448    11.5%
4      192    4.9%
7      192    4.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     3136   80.3%
Other Punctuation  768    19.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      1088   34.7%
0      640    20.4%
5      576    18.4%
1      448    14.3%
4      192    6.1%
7      192    6.1%

Other Punctuation
Value  Count  Frequency (%)
.      768    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3904   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      1088   27.9%
.      768    19.7%
0      640    16.4%
5      576    14.8%
1      448    11.5%
4      192    4.9%
7      192    4.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3904   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      1088   27.9%
.      768    19.7%
0      640    16.4%
5      576    14.8%
1      448    11.5%
4      192    4.9%
7      192    4.9%

overall height
Categorical

HIGH CORRELATION
UNIFORM

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value  Count
3.5    384
7.0    384

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2304
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 7.0
2nd row: 7.0
3rd row: 7.0
4th row: 7.0
5th row: 7.0

Common Values

Value  Count  Frequency (%)
3.5    384    50.0%
7.0    384    50.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
7.0    384    50.0%
3.5    384    50.0%

Most occurring characters

Value  Count  Frequency (%)
.      768    33.3%
7      384    16.7%
0      384    16.7%
3      384    16.7%
5      384    16.7%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1536   66.7%
Other Punctuation  768    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
7      384    25.0%
0      384    25.0%
3      384    25.0%
5      384    25.0%

Other Punctuation
Value  Count  Frequency (%)
.      768    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2304   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      768    33.3%
7      384    16.7%
0      384    16.7%
3      384    16.7%
5      384    16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2304   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.      768    33.3%
7      384    16.7%
0      384    16.7%
3      384    16.7%
5      384    16.7%

orientation
Categorical

UNIFORM

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value  Count
2      192
5      192
3      192
4      192

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 768
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 4
4th row: 5
5th row: 2

Common Values

Value  Count  Frequency (%)
2      192    25.0%
5      192    25.0%
3      192    25.0%
4      192    25.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
4      192    25.0%
3      192    25.0%
5      192    25.0%
2      192    25.0%

Most occurring characters

Value  Count  Frequency (%)
2      192    25.0%
3      192    25.0%
4      192    25.0%
5      192    25.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  768    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      192    25.0%
3      192    25.0%
4      192    25.0%
5      192    25.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  768    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      192    25.0%
3      192    25.0%
4      192    25.0%
5      192    25.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  768    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      192    25.0%
3      192    25.0%
4      192    25.0%
5      192    25.0%

glazing area
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value                Count
0.10000000149011612  240
0.25                 240
0.4000000059604645   240
0.0                  48

Length

Max length: 19
Median length: 18
Mean length: 13
Min length: 3
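
The long labels such as 0.10000000149011612 are what 0.1 and 0.4 look like after being rounded to a 32-bit float and printed at double precision. The report does not state how the data were stored, so treating the column as having passed through float32 is an assumption. The sketch below reproduces the labels with the standard-library `struct` module and recomputes the length statistics from the counts in this section.

```python
import struct

def through_float32(x):
    """Round a Python float to the nearest 32-bit float and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Nominal values (assumed) and the counts copied from the "glazing area" section.
counts = {0.1: 240, 0.25: 240, 0.4: 240, 0.0: 48}

# String labels as they would appear once the values are treated as text.
labels = {str(through_float32(value)): count for value, count in counts.items()}

total_characters = sum(len(label) * count for label, count in labels.items())
mean_length = total_characters / sum(labels.values())
print(sorted(labels), total_characters, mean_length)
```

Under that assumption the labels have lengths 19, 4, 18 and 3, which gives the 9984 total characters and the mean length of 13 reported for this variable.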

Characters and Unicode

Total characters: 9984
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value                Count  Frequency (%)
0.10000000149011612  240    31.2%
0.25                 240    31.2%
0.4000000059604645   240    31.2%
0.0                  48     6.2%

Length

Histogram of lengths of the category

Pie chart

Value                Count  Frequency (%)
0.4000000059604645   240    31.2%
0.25                 240    31.2%
0.10000000149011612  240    31.2%
0.0                  48     6.2%

Most occurring characters

Value  Count  Frequency (%)
0      4656   46.6%
1      1200   12.0%
4      960    9.6%
.      768    7.7%
6      720    7.2%
5      720    7.2%
9      480    4.8%
2      480    4.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     9216   92.3%
Other Punctuation  768    7.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4656   50.5%
1      1200   13.0%
4      960    10.4%
6      720    7.8%
5      720    7.8%
9      480    5.2%
2      480    5.2%

Other Punctuation
Value  Count  Frequency (%)
.      768    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  9984   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      4656   46.6%
1      1200   12.0%
4      960    9.6%
.      768    7.7%
6      720    7.2%
5      720    7.2%
9      480    4.8%
2      480    4.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9984   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      4656   46.6%
1      1200   12.0%
4      960    9.6%
.      768    7.7%
6      720    7.2%
5      720    7.2%
9      480    4.8%
2      480    4.8%

glazing area distribution
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8125
Minimum: 0
Maximum: 5
Zeros: 48
Zeros (%): 6.2%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1.75
Median: 3
Q3: 4
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 2.25

Descriptive statistics

Standard deviation: 1.550959664
Coefficient of variation (CV): 0.5514523251
Kurtosis: -1.148708815
Mean: 2.8125
Median Absolute Deviation (MAD): 1
Skewness: -0.08868917544
Sum: 2160
Variance: 2.40547588
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
5      144    18.8%
4      144    18.8%
3      144    18.8%
2      144    18.8%
1      144    18.8%
0      48     6.2%

Minimum values

Value  Count  Frequency (%)
0      48     6.2%
1      144    18.8%
2      144    18.8%
3      144    18.8%
4      144    18.8%
5      144    18.8%

Maximum values

Value  Count  Frequency (%)
5      144    18.8%
4      144    18.8%
3      144    18.8%
2      144    18.8%
1      144    18.8%
0      48     6.2%

heating load
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 586
Distinct (%): 76.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.30720054
Minimum: 6.010000229
Maximum: 43.09999847
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 6.010000229
5th percentile: 10.46350012
Q1: 12.99250007
Median: 18.94999981
Q3: 31.66750002
95th percentile: 39.86000061
Maximum: 43.09999847
Range: 37.08999825
Interquartile range (IQR): 18.67499995

Descriptive statistics

Standard deviation: 10.09019575
Coefficient of variation (CV): 0.4523290913
Kurtosis: -1.245571862
Mean: 22.30720054
Median Absolute Deviation (MAD): 7.514999866
Skewness: 0.3604488848
Sum: 17131.93002
Variance: 101.8120503
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
15.15999985         6      0.8%
13                  5      0.7%
15.22999954         4      0.5%
14.60000038         4      0.5%
32.31000137         4      0.5%
28.14999962         4      0.5%
12.93000031         4      0.5%
15.09000015         4      0.5%
15.55000019         4      0.5%
10.68000031         4      0.5%
Other values (576)  725    94.4%

Minimum values

Value        Count  Frequency (%)
6.010000229  1      0.1%
6.039999962  1      0.1%
6.050000191  1      0.1%
6.070000172  1      0.1%
6.369999886  2      0.3%
6.400000095  2      0.3%
6.769999981  1      0.1%
6.789999962  1      0.1%
6.809999943  1      0.1%
6.849999905  1      0.1%

Maximum values

Value        Count  Frequency (%)
43.09999847  1      0.1%
42.95999908  1      0.1%
42.77000046  1      0.1%
42.74000168  1      0.1%
42.61999893  1      0.1%
42.5         1      0.1%
42.49000168  1      0.1%
42.11000061  1      0.1%
42.08000183  1      0.1%
41.95999908  1      0.1%

cooling load
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 636
Distinct (%): 82.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.58776039
Minimum: 10.89999962
Maximum: 48.02999878
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 10.89999962
5th percentile: 13.61750011
Q1: 15.62000012
Median: 22.07999992
Q3: 33.13250065
95th percentile: 40.03699837
Maximum: 48.02999878
Range: 37.12999916
Interquartile range (IQR): 17.51250052

Descriptive statistics

Standard deviation: 9.513305495
Coefficient of variation (CV): 0.3869122419
Kurtosis: -1.147190359
Mean: 24.58776039
Median Absolute Deviation (MAD): 7.540000439
Skewness: 0.3959924526
Sum: 18883.39998
Variance: 90.50298144
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
14.27999973         4      0.5%
14.27000046         4      0.5%
29.79000092         4      0.5%
17.20000076         4      0.5%
21.32999992         4      0.5%
14.67000008         3      0.4%
15.85000038         3      0.4%
32.83000183         3      0.4%
15.43999958         3      0.4%
13.72000027         3      0.4%
Other values (626)  733    95.4%

Minimum values

Value        Count  Frequency (%)
10.89999962  1      0.1%
10.93999958  1      0.1%
11.17000008  1      0.1%
11.18999958  1      0.1%
11.27000046  1      0.1%
11.28999996  1      0.1%
11.67000008  1      0.1%
11.72000027  1      0.1%
11.72999954  1      0.1%
11.73999977  1      0.1%

Maximum values

Value        Count  Frequency (%)
48.02999878  1      0.1%
47.59000015  1      0.1%
47.00999832  1      0.1%
46.93999863  1      0.1%
46.43999863  1      0.1%
46.22999954  1      0.1%
45.97000122  1      0.1%
45.59000015  1      0.1%
45.52000046  1      0.1%
45.47999954  1      0.1%

Interactions

[36 pairwise interaction plots; images not included]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
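
The calculation described above can be sketched in plain Python. The six (relative compactness, surface area) pairs are taken from the First rows and Last rows samples of this report, so the result describes only those six rows and not the full 768-row dataset; `pearson_r` is an illustrative helper, not the function the report itself used.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Rows 0, 4, 8, 758, 760 and 764 of the sample tables in this report.
relative_compactness = [0.98, 0.90, 0.86, 0.66, 0.64, 0.62]
surface_area = [514.5, 563.5, 588.0, 759.5, 784.0, 808.5]

r = pearson_r(relative_compactness, surface_area)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * v + 1 for v in relative_compactness], surface_area)
print(r, r_rescaled)
```

On these six rows r is strongly negative, in line with the high-correlation warning raised for this pair of variables.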

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
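
A minimal sketch of that recipe: rank both variables, then apply Pearson's formula to the ranks. Giving tied values the average of their positions is a common convention and is an assumption here, as are the helper names and the toy data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship ρ reaches 1 while Pearson's r stays below 1, which is the difference between monotonic and linear correlation.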

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
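
The pair-counting definition can be written down directly. The sketch below implements exactly the formula stated above, which is often called tau-a. Library implementations commonly apply a tie correction (tau-b) instead, so values shown in the report may differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)

# Hypothetical data: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```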

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
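
A minimal sketch of the bias-corrected statistic as it is usually written: compute the chi-squared statistic of the contingency table, then shrink both φ² and the table dimensions by terms of order 1/(n-1). The function name and the toy data are hypothetical, and the code is not taken from the profiling library.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corrected - 1, k_corrected - 1)
    if denominator <= 0:
        return 0.0  # degenerate table, e.g. a constant column
    return math.sqrt(phi2_corrected / denominator)

# Hypothetical labels: one perfectly associated pair and one independent pair.
same = ["a"] * 50 + ["b"] * 50
independent_x = ["a", "a", "b", "b"] * 25
independent_y = ["c", "d", "c", "d"] * 25
print(cramers_v_corrected(same, same),
      cramers_v_corrected(independent_x, independent_y))
```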

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     relative compactness  surface area  wall area  roof area  overall height  orientation  glazing area  glazing area distribution  heating load  cooling load
0    0.98                  514.5         294.0      110.25     7.0             2            0.0           0                          15.550000     21.330000
1    0.98                  514.5         294.0      110.25     7.0             3            0.0           0                          15.550000     21.330000
2    0.98                  514.5         294.0      110.25     7.0             4            0.0           0                          15.550000     21.330000
3    0.98                  514.5         294.0      110.25     7.0             5            0.0           0                          15.550000     21.330000
4    0.90                  563.5         318.5      122.50     7.0             2            0.0           0                          20.840000     28.280001
5    0.90                  563.5         318.5      122.50     7.0             3            0.0           0                          21.459999     25.379999
6    0.90                  563.5         318.5      122.50     7.0             4            0.0           0                          20.709999     25.160000
7    0.90                  563.5         318.5      122.50     7.0             5            0.0           0                          19.680000     29.600000
8    0.86                  588.0         294.0      147.00     7.0             2            0.0           0                          19.500000     27.299999
9    0.86                  588.0         294.0      147.00     7.0             3            0.0           0                          19.950001     21.969999

Last rows

     relative compactness  surface area  wall area  roof area  overall height  orientation  glazing area  glazing area distribution  heating load  cooling load
758  0.66                  759.5         318.5      220.5      3.5             4            0.4           5                          14.920000     17.549999
759  0.66                  759.5         318.5      220.5      3.5             5            0.4           5                          15.160000     18.059999
760  0.64                  784.0         343.0      220.5      3.5             2            0.4           5                          17.690001     20.820000
761  0.64                  784.0         343.0      220.5      3.5             3            0.4           5                          18.190001     20.209999
762  0.64                  784.0         343.0      220.5      3.5             4            0.4           5                          18.160000     20.709999
763  0.64                  784.0         343.0      220.5      3.5             5            0.4           5                          17.879999     21.400000
764  0.62                  808.5         367.5      220.5      3.5             2            0.4           5                          16.540001     16.879999
765  0.62                  808.5         367.5      220.5      3.5             3            0.4           5                          16.440001     17.110001
766  0.62                  808.5         367.5      220.5      3.5             4            0.4           5                          16.480000     16.610001
767  0.62                  808.5         367.5      220.5      3.5             5            0.4           5                          16.639999     16.030001